Integrated multidimensional database

ABSTRACT

A method of distributing research data from a common database to a user of the common database is provided. Data concerning research results and data upon which the research results are based are stored in a local database and are linked to each other. Data concerning research results and data upon which the research results are based are selectively extracted from the local database to the common database. Research data are then selected by a user of the common database from the extracted data concerning research results and from the data upon which the extracted data are based and the selected research data are distributed to the user.

CLAIM OF PRIORITY

[0001] This application is a continuation application and claims priority under 35 USC § 120 to U.S. patent application Ser. No. 09/727,594, filed on Dec. 4, 2000, and to U.S. Patent Application Serial No. 60/181,227, filed on Feb. 9, 2000, the entire contents of both of which are incorporated by reference.

TECHNICAL FIELD

[0002] This invention relates to databases, and more particularly to integrated multidimensional databases for managing scientific research data.

BACKGROUND

[0003] Researchers have performed experiments on and made observations of biological tissue samples suggesting that a molecular basis for cancer and other diseases might be discovered through careful molecular analysis of such tissues. Such an understanding could permit improvement in the diagnosis, screening, and treatment of disease, and could permit disease treatment to be tailored to the specific molecular defects found in an individual patient. Many different researchers and laboratories study the molecular basis of disease and a large amount of data and information is produced from such studies. Optimization of the handling and integration of research results, data, and other information produced and used by various laboratories devoted to studying the molecular basis of cancer and other tissue-based diseases is advantageous for realizing improvements in the understanding and treatment of disease.

[0004] Even though many large genomic warehouse databases currently exist, and even though scientific laboratories are connected to the Internet, the data produced by a lab are not necessarily well handled, integrated, validated, searchable, and useable either by the lab producing the data or by another lab that might be interested in using the data. Generally, when data from biological tissue studies are published, only a limited set of the actual primary data (and sometimes none of it) is available for review and reanalysis. Moreover, common language and reference points are often not used for reporting the data. Even in the lab that did the original work, there is often no efficient or robust way to integrate data from a study with previous or subsequent studies. Furthermore, because of space limitations and the difficulty of tracking complex research methods, many published descriptions of laboratory research methods do not provide adequate information for another scientist to accurately reproduce an experiment, even though this is a central tenet of scientific publication. The end result is that when taken as a group, the many similar or related studies, while individually illuminating, are isolated and autonomous from each other, and do not achieve potential synergies.

[0005] Poor data handling may result in major problems that may slow or possibly prevent real progress in finding better treatment and diagnostic methods for major diseases. In particular, current methods of disseminating information from molecular studies of cancer and other diseases do not allow results from one study to be easily integrated with results from other studies. There is no standard way to link the results of DNA, RNA, and protein-based studies to cellular function or phenotype expression. Current methods of dissemination of the results of molecular studies do not allow preservation of a substantial portion of the original data supporting such studies, making it difficult for researchers to verify the conclusion of a research study or otherwise reinterpret the data.

SUMMARY

[0006] In one aspect, generally, a method of distributing research data from a common database to a user of the common database is provided. Data concerning research results and data upon which the research results are based are stored in a local database, with research results linked to the data upon which the research results are based. Data concerning research results and data upon which the research results are based are selectively extracted from the local database to the common database. Research data are then selected by a user of the common database from the extracted data concerning research results and from the data upon which the extracted data are based and the selected research data are distributed to the user.

[0007] Implementations may include one or more of the following features. For example, when the research data are distributed, the data concerning research results and the data upon which the research results are based are distributed in a defined database table structure. The distribution of research data may include giving a reviewer electronic access to the data concerning research results and to the data upon which the research results are based. The approval of the reviewer may be required before the research data are publicly distributed.

[0008] The data upon which the research results are based may include phenotype data and genotype data. The data upon which the research results are based can include information concerning equipment and supplies used in generating the research results, or information concerning biomaterials used in generating the research results.

[0009] Information concerning protocols used in generating the research results may be stored in the local database. The information concerning protocols used in generating the research results may be linked to the data concerning research results and to the data upon which the research results are based. Information concerning protocols used in generating the research results may be selectively extracted from the local database to the common database. Research data selected by a user of the common database from the information concerning protocols used in generating the research results may be distributed to the user.

[0010] The data upon which the research results are based may include information concerning equipment and supplies used in generating the research results, and may include information concerning biomaterial used in generating the research results.

[0011] In another general aspect, a system for distributing research data may include a processor, an output device for viewing the research data, and memory for storing instructions performed by the processor. The memory includes instructions for storing data concerning research results in a local database, storing data upon which the research results are based in the local database, and linking the data concerning research results to the data upon which the research results are based. The memory also includes instructions for selectively extracting data concerning research results and data upon which the research results are based from the local database to the common database and for distributing to a user of the common database research data selected by the user from the extracted data concerning research results and the data upon which the extracted data are based.

[0012] The memory of the system can also include instructions for storing information concerning protocols used in generating the research results in the local database, and for linking the information concerning protocols used in generating the research results to the data concerning research results and to the data upon which the research results are based. The memory can include instructions for selectively extracting information concerning protocols used in generating the research results from the local database to the common database, and for distributing to a user of the common database research data selected by the user from the information concerning protocols used in generating the research results.

[0013] In another general aspect, a computer program, residing on a computer-readable medium, for distributing research data includes instructions for causing a computer to store data concerning research results in a local database, store data upon which the research results are based in the local database, and link the data concerning research results to the data upon which the research results are based. The program includes instructions for selective extraction of data concerning research results and data upon which the research results are based from the local database to the common database, and for distributing to a user of the common database research data selected by the user from the extracted data concerning research results and the data upon which the extracted data are based.

[0014] Other features and advantages will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

[0015] FIG. 1 is a schematic diagram of the relationships between locally maintained databases and a common globally accessible database.

[0016] FIG. 2 is an exemplary screen shot of a front page of a database application.

[0017] FIG. 3 is an exemplary screen shot of a template for managing a database application.

[0018] FIG. 4 is an exemplary screen shot of search criteria used in a template for managing a database application.

[0019] FIGS. 5-8 are exemplary screen shots of templates for administering a database application.

[0020] FIG. 9 is an exemplary screen shot of a template for searching and storing contact information in a database application.

[0021] FIG. 10 is an exemplary screen shot of a template for searching and displaying contact information stored in a database application.

[0022] FIGS. 11 and 12 are exemplary screen shots of templates for searching and storing equipment and supply information in a database application.

[0023] FIG. 13 is an exemplary screen shot of a template for printing barcodes for equipment and supplies from a database application.

[0024] FIGS. 14 and 15 are exemplary screen shots of templates for searching and storing biomaterials information in a database application.

[0025] FIG. 16 is an exemplary screen shot of a template for storing biomaterials information in a database application.

[0026] FIG. 17 is an exemplary screen shot of a template for recording protocol information in a database application.

[0027] FIG. 18 is an exemplary screen shot of a template for recording and editing protocol information in a database application.

[0028] FIGS. 19 and 20 are exemplary screen shots of templates for creating and displaying protocol information in a database application.

[0029] FIG. 21 is an exemplary screen shot of a template for searching and displaying publicly known genetic information in a database application.

[0030] FIG. 22 is an exemplary screen shot of a template for inputting genetic information in a database application.

[0031] FIG. 23 is a schematic diagram of the protocols linking different kinds of research data.

[0032] FIG. 24 is a block diagram of a computer system.

[0033] Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

[0034] 1. Overview

[0035] A multi-user, computer-implemented database application (DA) allows laboratory (lab) researchers to plan, implement, manage, track, review, and interpret research within the lab as the research occurs. The DA permits the electronic publication of selected data and all supporting information, images, and annotations in database format in a searchable publication database. The data in the publication database may be accessed and used by subscribers to the database or by the general public.

[0036] The DA allows automatic integration of complex molecular research data on both a single patient level and a multiple patient level, and provides templates within which molecular data from multiple sources and of various types may be summarized in a fashion that takes advantage of human and computerized pattern recognition abilities. When used in molecular research studies, the DA highlights relationships between genotypes and phenotypes. The DA additionally allows users from different institutions and laboratories to compare molecular data from various studies in a highly organized way, to verify the results of existing studies based on the underlying fundamental research data, and to use fundamental research data from existing studies in new studies and to answer different questions than were answered in the studies for which the research data were collected.

[0037] The organization of the database is such that the organism under study (human, animal, or other) is central, and each step of the research process is related back to the organism as research occurs. Each step of the process is logically dependent and fully communicative with previous steps. So, for example, to be used in a gene sequencing reaction, a specific tissue reagent (such as a purified DNA sample) is given a unique identifier in the database that is programmatically related to its antecedents (e.g., the tissue block from which it originated, the organ from which it originated, and the patient from which it originated), so that it can be related to all previous and subsequent research involving that tissue reagent. Thus, links between genotype and phenotype are created and maintained.
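The antecedent chain described above can be illustrated with a minimal sketch. The record layout, the identifiers (`PT-001`, `DNA-500`, and so on), and the `lineage` helper are assumptions made for illustration only and are not taken from the DA itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Item:
    """Hypothetical record: every item points to the item it was derived from."""
    item_id: str
    kind: str
    parent_id: Optional[str]


def lineage(items: Dict[str, Item], item_id: str) -> List[Item]:
    """Walk from an item back to the organism it originated from."""
    chain: List[Item] = []
    seen = set()
    current = items.get(item_id)
    while current is not None and current.item_id not in seen:
        seen.add(current.item_id)  # guard against accidental cycles
        chain.append(current)
        current = items.get(current.parent_id) if current.parent_id else None
    return chain


records = [
    Item("PT-001", "patient", None),
    Item("ORG-010", "organ", "PT-001"),
    Item("BLK-100", "tissue block", "ORG-010"),
    Item("DNA-500", "purified DNA sample", "BLK-100"),
]
items = {r.item_id: r for r in records}
kinds = [i.kind for i in lineage(items, "DNA-500")]
```

Because each identifier resolves to its parent, any result recorded against `DNA-500` can be related back to the tissue block, the organ, and the patient without storing that information again.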

[0038] Although the initial intended use of this DA is for cancer research and related activities, it may be used similarly in support of any body-tissue or body-fluid-based research where biomolecular, clinical, and pathology data need to be integrated, or to plan, manage, organize, display, and interpret data generally. The DA may be implemented on any computer system, including a networked system of computers in which the DA is implemented on the Internet. The DA may be accessible to users through any computer-implemented interface including a web browser interface.

[0039] Each laboratory using the database application maintains its own copy of the DA, including the front-end scripts and the back-end database shell, and has full and exclusive control over what is entered and accessible within its copy of the database. Thus, each laboratory has its own “franchise” to use the database technology and may use the technology as a “palette” with which to organize, test, and keep a record of laboratory research without supervision and in full privacy from laboratory outsiders. Each laboratory using the DA may allow selective access (e.g., via the Internet) to all or portions of its database records to other laboratories or individuals. Thus, a network of collaborating laboratories and their databases is enabled.

[0040] When a researcher is ready to make data available to a wider audience, the researcher performs a query on his or her laboratory database to collect the data to be reported. The results of this query (with the tables it draws upon and the specific contents specified by the query) are extracted from the database as a deliverable entity. This corpus of information may be known as a “datamorph” and may include a deeply referenced and detailed version of the information often included in the “Materials and Methods” and “Results” sections of paper and electronic scientific publications. The datamorph, however, may include all phenotype base data and all laboratory protocols used to extract and interpret data from a tissue sample in the query result, as contrasted with an abbreviated description of materials, methods, and selected results used to support a scientific conclusion. All of the results data contained in the datamorph may be linked to accessible data image(s) (e.g., images of cells from a tissue sample or electropherogram images) from which the results are obtained. The researcher may write an introduction and discussion to accompany this “datamorph,” and may add appropriate database and literature references. The researcher may then electronically transmit the entire completed datamorph package, including the discussion and reference lists and/or links, to a publication database, where it may be accessible to database users immediately or where it may be placed in a temporary area for review.
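One way to picture the extraction of a datamorph is as a query whose result rows are packaged together with the rows they link to. The sketch below uses Python's standard `sqlite3` module; the table layout, column names, sample rows, and the `extract_datamorph` helper are all hypothetical and stand in for whatever structure a laboratory copy of the DA actually defines.

```python
import sqlite3
from typing import Any, Dict, List


def build_lab_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for a laboratory database (illustrative rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript("""
        CREATE TABLE sample (sample_id TEXT PRIMARY KEY, patient_id TEXT);
        CREATE TABLE protocol (protocol_id TEXT PRIMARY KEY, description TEXT);
        CREATE TABLE result (
            result_id INTEGER PRIMARY KEY,
            sample_id TEXT REFERENCES sample(sample_id),
            protocol_id TEXT REFERENCES protocol(protocol_id),
            gene TEXT, finding TEXT, image_path TEXT);
        INSERT INTO sample VALUES
            ('DNA-500', 'PT-001'), ('DNA-501', 'PT-002'), ('DNA-502', 'PT-003');
        INSERT INTO protocol VALUES
            ('SEQ-1', 'Sequencing reaction'), ('PCR-2', 'PCR amplification');
        INSERT INTO result VALUES
            (1, 'DNA-500', 'SEQ-1', 'TP53', 'variant', 'img/1.png'),
            (2, 'DNA-501', 'SEQ-1', 'TP53', 'wild type', 'img/2.png'),
            (3, 'DNA-502', 'PCR-2', 'AR', 'amplified', 'img/3.png');
    """)
    return conn


def _linked_rows(conn: sqlite3.Connection, table: str, key: str,
                 ids: List[str]) -> List[Dict[str, Any]]:
    """Fetch the rows of a linked table whose key appears in ids."""
    if not ids:
        return []
    marks = ",".join("?" * len(ids))
    sql = f"SELECT * FROM {table} WHERE {key} IN ({marks}) ORDER BY {key}"
    return [dict(r) for r in conn.execute(sql, ids)]


def extract_datamorph(conn: sqlite3.Connection, gene: str) -> Dict[str, Any]:
    """Package the query results together with the supporting rows they link to."""
    results = [dict(r) for r in conn.execute(
        "SELECT * FROM result WHERE gene = ? ORDER BY result_id", (gene,))]
    sample_ids = sorted({r["sample_id"] for r in results})
    protocol_ids = sorted({r["protocol_id"] for r in results})
    return {
        "results": results,
        "samples": _linked_rows(conn, "sample", "sample_id", sample_ids),
        "protocols": _linked_rows(conn, "protocol", "protocol_id", protocol_ids),
    }


datamorph = extract_datamorph(build_lab_db(), "TP53")
```

The returned dictionary keeps each result next to its sample, protocol, and image reference, which is the property that lets a reader of the published package trace a conclusion back to the underlying data.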

[0041] The publication database supports data from different laboratory databases and is created and maintained by the same database administrative body that creates the individual laboratory copies of the DA. If the datamorphs are reviewed and/or edited before being made accessible, the editor of the publication database may give confidential and time-limited access to a completed datamorph to appropriate reviewers who then review the datamorph. The editor may give such access by providing the reviewers with Uniform Resource Locator (URL) links and passwords to view the datamorph in an automated or manual way. Because the database and datamorphs are accessible to remote users and reviewers, a reviewer may complete a review from a remote location and have the completed review forwarded to an editor's attention in an automated fashion. Based on these reviews, an editor may decide whether to obtain more reviews, return the datamorph to the sender for revision, or enter the datamorph into the publication database for viewing. If the datamorph is entered into the publication database, the completed datamorph is given a unique completed datamorph ID that allows users of the DA to search the datamorph as a singular entity, in addition to searching portions of the datamorph as part of the entire corpus of published database material.

[0042] At the discretion of the editor and the submitter of the completed datamorph, access to specific datamorphs may be controlled so that certain users (e.g., other submitters to the database or financial supporters of the database) are provided earlier access to a specific datamorph. For example, if a completed datamorph is accepted for publication in the database on a specific date, other researchers who have submitted accepted datamorphs to the database within a certain time period prior to this submission may be given immediate access to the newly published datamorph. Subscribers to the database may be given access to the datamorph on a certain date after initial acceptance, and the general public may be given full access at a later date. An abstract of the findings, without full searchable access to the underlying data, may be made available to the general public on an earlier date than full searchable access becomes available.
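The staged release described above may be sketched as a simple date comparison. The user classes and, in particular, the delay values below (0, 30, and 180 days) are hypothetical placeholders; the description leaves the actual periods to the editor and the submitter.

```python
from datetime import date, timedelta

# Hypothetical delays after acceptance before full searchable access is granted.
FULL_ACCESS_DELAY = {
    "contributor": timedelta(days=0),    # recent submitters: immediate access
    "subscriber": timedelta(days=30),
    "public": timedelta(days=180),
}
# Hypothetical delay before the abstract alone becomes visible to everyone.
ABSTRACT_DELAY = timedelta(days=30)


def access_level(user_class: str, accepted_on: date, today: date) -> str:
    """Return 'full', 'abstract', or 'none' for a datamorph accepted on accepted_on."""
    if today >= accepted_on + FULL_ACCESS_DELAY[user_class]:
        return "full"
    if today >= accepted_on + ABSTRACT_DELAY:
        return "abstract"
    return "none"


accepted = date(2000, 3, 1)
public_after_60_days = access_level("public", accepted, accepted + timedelta(days=60))
```

In this sketch a member of the general public sees only the abstract between the abstract release and the full public release, while contributors have full access from the acceptance date.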

[0043] Other selected users may have direct access to database data. For example, patients or research participants may be given access to data generated from their participation in a study, which may encourage patient participation in research. Patients may particularly benefit from such access when a molecular analysis of tissue samples is performed in order to predict phenotype behavior based on genotype characteristics. Since patients with specific molecular level characteristics may respond better to specific chemotherapeutic or other treatments than other patients without their specific molecular level characteristics, if a patient is given the opportunity to view and compare his or her own tissue sample data to data from other patients, appropriate treatments for the patient may be identified from the other patients' data.

[0044] The database front-end scripts and back-end database structures are compatible among different labs' copies of the database, and each lab's copy of the database is compatible with the publication database(s). Database compatibility is maintained by preventing users from changing the back-end or front-end structure of the database locally, or by giving users only limited freedom to make such changes. Thus, all copies of the database application are equal in form and able to communicate on an automated basis. Where appropriate, XML (Extensible Markup Language), SGML (Standard Generalized Markup Language) or other implementation languages may be used in a browser-based implementation of the DA with appropriate style sheets on the front-end to allow interoperability between the database structure and other data formats used by a laboratory. Although XML provides enhanced interoperability, communication through XML allows different users to interact with the database through different front-end and back-end structures and permits the collection, input, and interpretation of data to occur in different environments, and therefore may increase variability in data validation at each laboratory location. Management of the databases is streamlined from the database administrator's perspective, however, if users cannot locally change the database structure or use of style sheets.

[0045] Whether or not front-end style sheets are enabled, individual laboratories are not able to change the front-end or back-end database structure (as opposed to its data contents, over which they have full control) on their own, or have only limited capacity to do so. To make this lack of local control over back-end and front-end structure palatable in a research environment, the authority administering the database structure must be highly responsive to users' needs as they arise, and rapid automated updating of both back-end and front-end structures is necessary. To this end, and to encourage user involvement in the development and improvement of the database application, the authority responsible for the database back-end and front-end structures may optionally make the front-end scripts and back-end database structures fully viewable by a user, but unchangeable in the user's registered copy of the front-end and back-end scripts. The user may be permitted to create a temporary copy of the database structures where new ideas for front-end scripting or back-end structure can be tested locally. Protections may be built into the software to prevent the user from copying the registered database or the temporary database and providing it to anyone other than the authority in charge of the template database front-end and back-end structure. The database administrator may provide the user with a clear path to submit suggested improvements to the database, and if a suggestion is accepted and/or incorporated into the database the user may receive a reduction in subscriber fees or other compensation. Disagreements over needed functionality that cannot be resolved through discussion may be put up for a vote among users, and/or may be decided by a board of advisors.

[0046] Some of the relationships between the template database, the laboratory databases, and the publication database are shown schematically in FIG. 1. A database template 102 is maintained by the institute or other authority charged with its maintenance and improvement. One or more laboratories use their own copies of this database 104 and contribute to a publication database 106, which may also be accessible to non-contributing subscribers using their own copies of this database 108. The solid lines between the laboratory databases and the publication database symbolize the network of communication among the laboratories and other database users via the publication database. The two-headed solid arrows symbolize the transfer of information from an individual lab to the publication database in the form of completed datamorphs and the retrieval of information from the publication database by the laboratory. Retrieval of information by non-contributing users 108 is symbolized by a one-headed arrow. Two-headed dotted arrows symbolize communication between the database users and the database administrator to develop and improve the database application.

[0047] 2. Components of the Database

[0048] The database application may include multiple templates to receive, organize, manipulate, and present scientific research data. A Project Manager template may be used for storing, organizing, and accessing data related to the DA's internal functionality. A Patients template may be used for storing, organizing, and accessing data related to patients, animals, plants, organisms, and research subjects generally. A Contact Manager template may be used for storing, organizing, and accessing data related to people, companies, and organizations in contact with a laboratory, and to keep track of specific laboratory projects. An Equipment and Supplies template may be used for storing, organizing, and accessing data related to equipment and supplies used during laboratory research. A Biomaterials template may be used for storing, organizing, and accessing data related to tissue, fluids, and other biomaterials used during research. A Protocols template may be used for storing, organizing, and accessing data that relate to connections, links, and relationships among the aforementioned data, and may be used to analyze existing data and to add new data to the database, while maintaining existing connections, links, and relationships. A Scientific Query Builder template may be used for extracting information from the stored data. These templates may interact with each other so that links between data in the same and different templates may be established by and displayed to users of the database. Referring to FIG. 2, when implemented in a web browser environment (e.g., Microsoft's Internet Explorer™), the functionality of the templates may be accessible to a user through links to the templates 202 from a front page 200 of the DA.

[0049] Project Manager Template

[0050] The project manager template provides a place for laboratory members to maintain data related to, for example, projects, project status, people involved in a project, and records of meetings. Referring to FIG. 3, the template provides input fields for information relating to a project, including name 300, category 302, function 304, comments 306, status 308, priority 310, completeness 312, identification numbers 314, error prevention mechanisms 316, internal laboratory contributors 318, external contributors 320, and related fields 322. Referring to FIG. 4, existing projects that have been entered in the DA can be searched and sorted using one or more of the input fields 400 for the project, including, for example, the project status and the person involved with the project. Hyperlinks to different projects may be displayed and different projects may be sorted by project category 402 and further sorted by project name 404.

[0051] Referring to FIG. 5, the project manager template may also be accessed and modified by a local laboratory database administrator through an “Admin” area 500. A local administrator of the database may control user access to specific applications, portions of applications, and types of data accessible in the DA. For example, as shown in FIG. 5, a user who is a member of the project developer list 502 may have access to different applications than a user who is a member of the computer information list 504. Each laboratory deploying a copy of the template database may designate an administrator responsible for controlling access to the DA by users in the laboratory. The administrator may add or remove a user 506 of the lab database from lists of users 508 who have access to particular applications.
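List-based access control of this kind can be sketched with a mapping from list names to sets of users. The `AccessLists` class, its method names, and the user names are hypothetical; only the two list names echo the lists shown in FIG. 5.

```python
from typing import Dict, Set


class AccessLists:
    """Hypothetical per-laboratory registry of which users belong to which access list."""

    def __init__(self) -> None:
        self._lists: Dict[str, Set[str]] = {}

    def add_user(self, list_name: str, user: str) -> None:
        """Administrator action: put a user on an access list."""
        self._lists.setdefault(list_name, set()).add(user)

    def remove_user(self, list_name: str, user: str) -> None:
        """Administrator action: take a user off an access list (no error if absent)."""
        self._lists.get(list_name, set()).discard(user)

    def can_access(self, list_name: str, user: str) -> bool:
        """True if the user is currently a member of the named list."""
        return user in self._lists.get(list_name, set())


acl = AccessLists()
acl.add_user("project developer", "alice")
acl.add_user("computer information", "bob")
acl.remove_user("computer information", "bob")
```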

[0052] Referring to FIGS. 6-8, the project manager template also allows users with special access to the DA (e.g., a system-wide database administrator) to see the names of the tables 602 within the database. A particular table 604 may be selected (e.g., by clicking on a hyperlink to the table) to display the names 704 and definitions 706 of all the attributes 708 within the table, and the location and names of front-end scripts. Specific attribute definitions of a particular table may be viewed and modified using input fields 802. These functions are retained in the master template database, but are “read only” in the template database provided to laboratories and other non-special users of the DA. Viewing table structures provides a useful reference for users who are trying to understand how data are handled within the database in order to identify ways of improving performance, even if those users do not have authority to modify the table structures. A system-wide database administrator may design and control help messages for this template, which are “read only” in template databases provided to users, but are fully functional in the master template database.

[0053] Patients

[0054] The Patients template enables efficient collection of appropriate research data from patients and other research subjects and specimens, and encodes the data in a way that permits, when searching the data, telescoping of the data from broad-based data arrays to intermediate arrays to single data points.

[0055] Using the Patients template, a clinical coordinator, a physician, or a nurse may manage patient contact information, informed discussion of studies and consent procedures, and collection of clinical data needed for research purposes from consenting patients. Data collected may vary according to the study for which they are used, and the template may be adapted to any study or any group of patients. This function supports not only the collection of specific data important for research, but also tracking of the process of collecting these data. Because collection of patient data often entails contacting numerous physicians' offices, hospitals, and clinics, transferring appropriate consent forms, receiving records, abstracting data from the records, and inputting data into the database, tracking each of these steps through the DA allows study coordinators and directors to improve both the quality and the efficiency of patient data management.

[0056] The Patients template may be built to receive and manage data on, for example, prostate cancer patients who give informed consent to participate in a metastatic prostate cancer study. Categories of data collected for such a study may include many that are relatively generic, such as exposure history, occupational history, dietary history, past medical and surgical history, weight, age, family history, and ethnicity. Prostate cancer specific data, such as serum PSA (prostate-specific antigen) values and Gleason histological grading data, may also be collected. Once the data to be collected and tracked for an individual study are defined, the database template allows easy importation from other databases.

[0057] Clinical data may be stored using methodology supporting telescoping from broad-based data arrays to single data points, and suitable for use with pattern recognition algorithms. As an example of clinical data telescoping, a patient's alcohol exposure history may be collected from a patient's answers to a questionnaire and displayed in granular format separating exposure to wine, beer, and liquor and tracking the source of all data stored in the DA. The data may be displayed as a graphic depicting the patient's year-by-year exposure over his lifetime. This lifetime exposure graphic then may be further condensed to a single color-coded dot depicting his highest exposure level for any one year. Thus, the patient's entire exposure history is telescoped from the highest level of summarization (the dot) to the summary for all types of liquor (the exposure graphic representing exposure for all types of liquor) to the separate graphics for each type of liquor, and so on down to the patient's answers to each question on the questionnaire. By encoding the data and allowing convenient viewing of underlying data in this manner, a discovery tool based on innate human pattern recognition abilities is enabled. Machine-based pattern recognition is also enabled.
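The telescoping of an exposure history can be sketched as three views computed from the same questionnaire answers. The answer format (year, beverage, drinks per week), the sample values, and the color thresholds are hypothetical and chosen only to make the levels of summarization concrete.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical questionnaire answers: (year, beverage, drinks per week).
Answer = Tuple[int, str, int]

answers: List[Answer] = [
    (1980, "wine", 2), (1980, "beer", 5),
    (1981, "beer", 10), (1981, "liquor", 6),
    (1982, "wine", 1),
]


def by_type(rows: List[Answer]) -> Dict[str, Dict[int, int]]:
    """Most granular graphic: one year-by-year series per beverage type."""
    series: Dict[str, Dict[int, int]] = defaultdict(dict)
    for year, beverage, amount in rows:
        series[beverage][year] = series[beverage].get(year, 0) + amount
    return dict(series)


def by_year(rows: List[Answer]) -> Dict[int, int]:
    """Intermediate graphic: all beverage types combined, year by year."""
    totals: Dict[int, int] = defaultdict(int)
    for year, _beverage, amount in rows:
        totals[year] += amount
    return dict(totals)


def summary_dot(rows: List[Answer]) -> str:
    """Highest summarization: one color for the peak yearly exposure (assumed thresholds)."""
    peak = max(by_year(rows).values(), default=0)
    if peak <= 7:
        return "green"
    if peak <= 14:
        return "yellow"
    return "red"


dot = summary_dot(answers)
```

Each level is derived from the one below it, so a viewer who starts at the dot can expand to the combined yearly series, then to the per-beverage series, and finally to the individual answers.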

[0058] For research animals or other non-human organisms under study, a separate link for management of pertinent data relating to them is located adjacent to the link to “patients.” Data relating to research involving laboratory animals and other organisms are managed similarly to patient data.

[0059] Contact Manager

[0060] Referring to FIG. 9, the Contact Manager template allows a user to record and/or search contact information for people, institutions, laboratories, suppliers and manufacturers related to laboratory research 902 using a variety of search criteria 904, including name, address, id number, and zip code. Relationships between people and these other entities are maintained in the Contact Manager template of the DA, so that the DA may be used not only to perform standard text searches of specific fields, but also to search for, for example, all the people affiliated with a certain institution A or all the people working in laboratory A. The Contact Manager template also maintains relationships between entities within an organization, so that a tree of relationships can be displayed. Referring to FIG. 10, for example, within an academic medical center, a hierarchical display of the hospital names 1002, department names 1004, and the labs and other offices 1006 within each may be provided with a hyperlink to the phone number, email address, web address, or other contact information for each entity. A similar hierarchy of information is maintained for each address and contact, so that the user can display and search the data types as a series of hierarchical links if desired, with the user clicking on the specific point in the hierarchy desired to obtain information about that point. The information in the Contact Manager may be directly downloadable to personal digital assistants.

[0061] Equipment and Supply

[0062] The Equipment and Supply template allows a user to search and record data related to equipment, supplies, and reagents used in the laboratory. As used in the DA, equipment is anything purchased or otherwise obtained by the lab that may be used multiple times in the lab, while a supply is generally nonreusable. A reagent is anything created within the laboratory, using supplies and equipment in the lab. A tissue reagent is a reagent that derives from an organism, and a non-tissue reagent is any reagent that does not derive from an organism.

[0063] Data may be input to the template when an item is purchased, when it is received, or at a later time, and may include the name, manufacturer, supplier, serial number, cost and budget data, storage location, and other data related to any equipment or supply used in the laboratory. Referring to FIG. 11, the template provides input fields 1102 for a variety of information relating to equipment and supplies and fields for a variety of search criteria 1104 that may be used to retrieve and organize information in the database.

[0064] Equipment and supplies may be categorized as “Types” and “Instances.” Unique types of equipment or supplies from a single manufacturer may have a single entry in the database. Each example of a certain type may be recorded as an instance of that item. For example, a Rainin P-10 manual dispensing pipette may be designated as an equipment type produced by a specific manufacturer, with a specific model number. Each of four Rainin P-10 pipettes used by the lab may then be entered in the template as separate instances of that type, with data on, for example, supplier, cost, and storage location stored separately for each instance. A user may search the database attributes recorded for each item type and instance. Referring to FIG. 12, the template may be used to facilitate reordering items that are ordered and used repeatedly in a lab. The user may create a new instance of an item in the template 1202, which copies all the retained data from the previous instance, and prompts the user to specify the price, supplier, and other necessary information for the new instance 1204. Method of payment data, including budget numbers, are collected so that the laboratory may track costs in any way needed for total lab management.
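
The Type/Instance distinction and the reorder step can be sketched as follows. This is an illustrative Python sketch under assumed names: the classes `ItemType` and `ItemInstance`, the `reorder` function, and all field values (supplier names, costs, locations, budget numbers) are hypothetical and are not the DA's actual schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ItemType:
    """One entry per unique type of equipment or supply from a manufacturer."""
    name: str
    manufacturer: str
    model_number: str


@dataclass(frozen=True)
class ItemInstance:
    """One physical example of a type, with its own purchasing data."""
    item_type: ItemType
    supplier: str
    cost: float
    storage_location: str
    budget_number: str


def reorder(previous: ItemInstance, *, cost: float, supplier: str) -> ItemInstance:
    """Create a new instance by copying the previous one.

    Retained data (type, storage location, budget number) are copied, while
    the caller must supply the price and supplier for the new order.
    """
    return replace(previous, cost=cost, supplier=supplier)


p10 = ItemType("P-10 manual dispensing pipette", "Rainin", "P-10")
first = ItemInstance(p10, "Supplier A", 250.0, "Bench 3", "B-100")
second = reorder(first, cost=265.0, supplier="Supplier B")
print(second)
```

Keeping the type as a separate record means that manufacturer and model data are entered once, while cost and supplier can differ for each of the lab's pipettes.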

[0065] Referring to FIG. 13, the DA may be barcode enabled to allow a user to print a standard barcode encoded label 1302 or to scan in a barcoded item 1304 that subsequently may be automatically located in the database upon entry of the barcode identification number.

[0066] To facilitate management of computer resources by network administrators, the Equipment and Supply template may also collect and track computer related data as data are accessed from and transmitted among different laboratory computers connected to a network. A link at the bottom of the right frame of equipment/supply displays computer related data for the laboratory, and specific data for each computer can be obtained by clicking on its name there or when it appears as a result of searches using the equipment/supply query engine.

[0067] Biomaterials

[0068] The biomaterials template allows laboratories to collect and track data related to tissues, which are usually stored as tissue blocks either frozen or embedded in paraffin, and body-fluids that are obtained for study purposes. Each tissue block may be related to its source (patient or other organism), its storage type and location, physical properties, such as the time it was placed in fixative or frozen, species origin, procedure origin, anatomic site of origin, anatomic orientation if available, and whether or not it contains tumor or other pathology. When retrieving biomaterials data from the database a user can select the data parameters desired for viewing tissue or body-fluid data, and when searching the database for biomaterials data the user may search different fields that identify the tissue or body-fluid data.

[0069] The DA also makes use of a unique Sentinel grading system with which investigators can categorize tissue blocks on a 1-4 scale. Many studies have a large number of tissue blocks from which to prepare samples for a particular study, and the Sentinel grading system provides a method for marking particular blocks as the source material that is most useful in a study. Using the Block Manager part of the Biomaterial application, the user may prioritize the blocks according to a method of the user's choosing, and may apply a Sentinel number of 1-4 to each one. For example, in an autopsy tissue study, blocks with a Sentinel 1 label may designate the tissue blocks that best represent a disease process (e.g., cancer) in a patient, and which should be included for analysis in all aspects of the study. Sentinel 4 blocks may designate those blocks that are rarely included for analysis in the study, because they do not contain cancer, or because they are of relatively poor quality, and Sentinel 2 and 3 blocks may be blocks that fall in between these two extremes. Since many users of the DA will have an inventory of tissue blocks that need to be entered into the database in order to be used, the data fields are defined in a way that allows entry of block data from any source, maintaining the block's originally assigned number, but also giving the block a new unique block id.
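
A minimal sketch of how a block record might carry both its original number and a new unique block id, together with a Sentinel grade, is given below. The names `TissueBlock`, `register_block`, `set_sentinel`, and `blocks_for_analysis`, the counter-based id scheme, and the sample block numbers are assumptions for illustration, not the DA's implementation.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Hypothetical id generator; a real database would assign ids itself.
_next_block_id = count(1)


@dataclass
class TissueBlock:
    original_number: str            # number assigned by the source inventory
    block_id: int                   # new unique id assigned on entry
    sentinel: Optional[int] = None  # Sentinel grade 1-4, if assigned


def register_block(original_number: str) -> TissueBlock:
    """Enter a block from any source, keeping its original number."""
    return TissueBlock(original_number, next(_next_block_id))


def set_sentinel(block: TissueBlock, grade: int) -> None:
    """Apply a Sentinel number; only grades 1 through 4 are accepted."""
    if grade not in (1, 2, 3, 4):
        raise ValueError("Sentinel grade must be 1, 2, 3, or 4")
    block.sentinel = grade


def blocks_for_analysis(blocks, max_grade: int = 1):
    """Select graded blocks whose Sentinel number is at most max_grade."""
    return [b for b in blocks if b.sentinel is not None and b.sentinel <= max_grade]


a = register_block("S98-123-A")
b = register_block("S98-123-B")
set_sentinel(a, 1)
set_sentinel(b, 4)
print(blocks_for_analysis([a, b]))
```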

[0070] Often in research studies, thin sections are cut from a block of tissue, placed on a glass slide, stained, and coverslipped to allow microscopic analysis. Microscopic images of such tissue slices may be digitized and stored in the database by the DA. Referring to FIG. 14, block data may be searched using a variety of search criteria 1402, and links to block data may be displayed 1404. The DA provides a method to maintain the relationship between the block of tissue and all the slides created from the block. For example, the first slide cut from a tissue block may be designated slide 1, the second slide 2, and so on, and each may be linked to the tissue block data.

[0071] Referring to FIG. 15, when the user searches for data from a particular block, links to all the pertinent data related to the tissue block, including slide images of tissues 1502, may be displayed. By clicking on a link to a particular slide 1512, all of the information available about that slide, e.g., how it was stained, its magnification, may be retrieved 1508. If one or more images have been collected for a particular slide, an arrowhead 1510 appears next to the slide name. By clicking on the arrowhead, the available images may be displayed as one or more links or thumbnail images below the slide. If the user clicks on one of these links or thumbnail images, the image of the selected slide 1506 appears in the right hand frame, with all associated data from that image 1508. The image is displayed in a zoomable format and the user may zoom in or out on parts of the image.

[0072] Each tissue block may be designated with a unique block id, and data for single or multiple blocks may be entered into the biomaterials template. Referring to FIG. 16, a user may select whether to enter data for a single block, or multiple blocks, or to reserve block id's for tissue samples 1602. After making this selection, the user is prompted to enter the relevant data into various input fields 1604. Collection of data related to body-fluid samples of any type is similarly enabled through input fields displayed to the user.

[0073] Barcode labels for blocks, slides, and body-fluids may be printed as in the Equipment and Supply template. Barcode data may be scanned in from any computer running the DA, directly into the common barcode input area in the header, and the data page for the item scanned may be displayed to the user.

[0074] Protocols

[0075] The Protocols application may be used to combine, link, and document relationships among data related to equipment, supplies, contacts, biomaterials, procedures, and activity in a laboratory. Moreover, the Protocols application may be used to analyze existing data and to add new data to the database, while maintaining existing connections, links, and relationships among data. Protocols may be used as vectors to documents and links stored in the database, including equipment, supplies, and reagents used, authors of a publication, performers of a study, genes studied, images of biomaterials used, gene function studied, molecular (genetic) data, and reagents and data resulting from the performance of a laboratory procedure. Essentially, protocols track laboratory activity and document the links among different laboratory data as the links are created through laboratory activity by researchers. This methodology allows tracking of all data from the final interpretive phase (e.g., a publication of the results of a study) back to the initial phases of the study through one or many such vectors. This method lends itself to visual representation, and simplifies the process of tracking, comparing, re-enacting, and validating procedures used to obtain data. Moreover, it permits outside researchers to use raw data obtained in a first study for the purposes of performing a second study, in which different questions may be posed and answered than were investigated in the first study. For example, a first research lab may investigate the effect of ultraviolet light exposure on the molecular structure of skin cells and on melanoma incidence. In performing the study, the first research lab may record the tobacco use history of research study participants and record these data in the DA. If a second researcher wishes to investigate the effect of tobacco use on melanoma incidence or on the genetic mutations in skin cells, the second researcher may use the data stored in the database by the first research lab, even if the relevant data were not published in the first research lab's report of the final results of its study.

[0076] The Protocols application provides a modular process for documenting laboratory procedures by showing links among laboratory data. The following is an example of creating a protocol for documenting sequencing of the TP53 gene in a laboratory. Referring to FIG. 17, after clicking on the protocols link on the front page 200 of the DA to begin the Protocols application, the user may select the “Add New” radio button 1702 in the protocol application left frame 1704. This brings up a menu in the right frame 1706, in which the user is prompted to select a protocol category, insert a protocol name and description, and indicate the final product of the execution of the protocol from a series of radio buttons including “Policy Information,” “Maintenance Procedure,” “Non-Tissue Reagent,” “Tissue Reagent Blocks,” “Tissue Reagent—Slides,” “Tissue Reagent—DNA,” “Tissue Reagent—Proteins,” and “Molecular Data” (not shown). This radio button selection determines what subsequent menus and templates will be provided to the user.

[0077] The Protocols application main page contains a search menu for existing protocols in the left frame 1702, and the right frame 1706 defaults to show all protocols that have been started previously, but which have not been completed and which are currently listed as “In Progress.” The user may select an existing protocol from the list in right frame 1706 by clicking on a hyperlink to the protocol. After selecting a particular protocol, a protocol control panel (left frame) 1802 and protocol body (right frame) 1804 for the protocol selected may be displayed as shown in FIG. 18. For each protocol in the database, a user may select authors (those who wrote the protocol or contributed significantly to writing the protocol) 1806, performers (those who are doing the work in the lab) 1808, and reviewers 1810, using the protocol control panel 1802. The review function is discussed further below. The user may then enter any external authors of the study (authors not in the contact database), and may then proceed to use the search boxes in the Protocol control panel 1802 to search for and select Equipment, Supplies, Primers (for polymerase chain reactions), Non-Tissue Reagents, and Tissue Reagents to be used in the protocol. Data which have been previously entered into the database may be displayed in the protocol body 1804, so that the user may select the data for inclusion in the protocol by clicking on a link to the data. The user can also scan barcoded items into the barcode area in the header frame to add these items to the protocol.

[0078] With each search for items to add to the protocol, the user is provided a response screen in the right frame. The user may check off boxes for items desired to be identified with the protocol and may then save the choices in order to add the items to the protocol body in the right frame, which may then reappear with the updated information. The user may then add text to the protocol as shown in right frame 1902 of FIG. 19 to document the steps followed in the protocol.

[0079] If the user selects “Molecular Data” as the product of a protocol, the “Add Genomic Data” radio button 2002 is provided to the user in the left-hand Protocol Control Panel as shown in FIG. 20. By clicking this button, the user is taken to a Genomic Information panel 2100 as shown in FIG. 21, in which the user may search for the gene name from a list of official gene names and associated information from the National Center for Biotechnology Information (NCBI) or another genetic information database 2102. Once an officially named gene has been selected, the user may select from a list of previously identified gene functions contained within the database for the gene. If the gene function under study is not listed, the user is prompted to enter the gene function, and an official NCBI PubMed ID for a journal reference supporting this function for this gene is provided to the user. The user then indicates the biomolecule under study, from a list including DNA, RNA, cDNA, protein, lipid, and carbohydrate (when an experiment studies more than one of the above, the user is asked to choose one primary focus of the study). If the user chooses DNA, for example, a list of types of DNA-based genomic assay data is provided, which may include sequence information, single strand conformational polymorphism, comparative genomic hybridization, and loss of heterozygosity data.

[0080] If, for example, the user chooses sequence information, a “Sample Sequence Information” panel 2202 may appear in a frame displayed to the user, as shown in FIG. 22. This panel may provide a template for precise specification of the sequencing data to be provided. Selections that may be required as inputs may include sequence origin (nuclear/mitochondrial), type (exonic/intronic/non-exonic, non-intronic regulatory, and non-regulatory, non-intragenic), the exon or intron under study (if this radio button was chosen), the number of base pairs (BP) to be recorded, whether the sequence to be provided is from the coding or non-coding strand, the sequence itself in standard 5′→3′ orientation (which is input as text data, with the user supplying the exact number of bases specified), an area to attach images from which data are derived, the image type to be added, the tissue reagent from which the specific data are derived, the GenBank Primary Accession Number(s) for a specific genomic reference sequence, and the requested data input of reference sequence from NCBI Locus Link, which is an annotated subset of a standard reference, such as the GenBank data, and which may be more reliable and less changeable as a reference sequence.
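
The requirement that the user supply exactly the number of bases specified can be checked mechanically. The following Python sketch is one possible check, not the DA's own validation: the function name `validate_sequence`, the decision to ignore whitespace, and the acceptance of `N` for an undetermined base are assumptions.

```python
# Assumed alphabet: the four DNA bases plus N for an undetermined base.
VALID_BASES = set("ACGTN")


def validate_sequence(sequence: str, declared_bp: int) -> str:
    """Return the cleaned 5'->3' sequence, or raise ValueError.

    Whitespace is stripped and letters are upper-cased before checking that
    only valid base letters are present and that the length equals the
    number of base pairs the user declared.
    """
    cleaned = "".join(sequence.split()).upper()
    invalid = set(cleaned) - VALID_BASES
    if invalid:
        raise ValueError(f"invalid base letters: {sorted(invalid)}")
    if len(cleaned) != declared_bp:
        raise ValueError(
            f"declared {declared_bp} bases but received {len(cleaned)}"
        )
    return cleaned


print(validate_sequence("atg gcc\nTTA", 9))
```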

[0081] Single or multiple sequences can be added to a protocol, depending on the number of samples under study. The user may be required to provide sequence data for each Tissue Reagent entered into the protocol so as to encourage complete data reporting on each protocol. Below the “Sample Sequence Information” submenu, a “Sequence Method Control Information” submenu 2204 is provided, in which the user may provide data from control reactions validating sequencing reactions performed in the laboratory with the equipment and supplies listed in the protocol.

[0082] Images attached to the protocol are referenced next to the data they represent, such as the sequence data 2206, shown in alphanumeric form, and may also be shown in graphical form. Virtually any type of image can be attached to a protocol (and contained within the database in binary form). For example, a gel image demonstrating the PCR products prior to the sequencing of the gene, tables of laboratory data, or microscopic images of tissue samples may be attached to a protocol. Such images may be annotated with text data indicating what is in each lane, or other explanatory text about the image, prior to storing in the database.

[0083] Depending on the type of data generated by a protocol, after the data are entered, the protocol performer or reviewer may be asked to interpret the data based on a common format available for that datatype. For example, for sequence data, if a mutation is detected, a reviewer may be asked to annotate the sequence data with the position, base changes, and predicted amino acid effect of a mutation. This information may then be recorded in a format as consistent as possible with existing formats for mutation reporting. When recording loss of heterozygosity data, the presence of loss or gain may be recorded.

[0084] When Tissue Reagents such as purified DNA are products of a laboratory protocol, the user is prompted to provide data on each of these reagents, and is given the opportunity to print unique barcode labels for each reagent created. A Tissue Reagent Search Screen is contained in the protocol control panel 1802. Tissue reagents may be categorized into various types to simplify searching, and their location (freezer, refrigerator), source protocol, and other metadata may be searched to locate reagents. A similar search panel is available for Non-Tissue Reagents.

[0085] The Protocols application may be used to analyze existing data and to add new data to the database. For example, referring to FIG. 15, images of tissue slides may be stored in the database for later analysis by a reviewer. One or more tissue slide images 1506 may be presented to a reviewer along with a nested set of questions regarding, for example, the overall interpretability of the image, whether the tissue contains cancer, and the Gleason grade of the tissue sample. The reviewer's answers to these questions may then be integrated into the database. By permitting the review of images through a web-based application, the Protocols application permits remote data analysis by widely distributed researchers. By presenting different reviewers with the same tissue sample images and the same questions, the Protocols application standardizes the analysis of the tissue. Standardization of questions may also be implemented in other parts of the Protocols application. For example, a standard set of questions requesting information relating to patient histories may be developed and stored in the database. Questions from the standard set of questions may then be pasted into a new or existing protocol, so that patient histories from the same or different studies may be meaningfully compared.

[0086] Data handling algorithms such as the one illustrated above for DNA sequence data are used for each major type of data recorded by a laboratory. As each algorithm is completed, it can be reused by the lab or by any other lab that uses the Template Database. Comparative genomic hybridization data, loss of heterozygosity data, and single strand conformational polymorphism data may also be identified, stored, and searched. Each type of data may have many things in common with previous types (e.g., the need to store images or the need to put data in context of the genome), so the handling of new data types becomes easier with continued use and development of protocols created with the DA. As users come across assay and data types that are not currently listed by the DA, an efficient web-based process of specification of the data type to be handled and the delivery of sample data may be provided to users, to allow the authority in charge of database structure to incorporate the new data type expeditiously and thus avoid user frustration.

[0087] Because of their modular design, individual protocols may be copied and used for new lab actions with the least possible repetitive work by the user. As protocols develop, are modified, and slowly “mutate” within labs, small changes may be easily tracked and recorded without reworking an entire document.

[0088] The Protocol Manager enables the graphical display of data relationships and workflow in a laboratory. Because of the methods used to record and relate information contained in protocols, a pedigree of protocol information for any data set can be displayed in a rapidly searchable, heuristically valuable way. For example, using database information, a pedigree of protocols involved in the generation of gene sequence data may be established, as shown in FIG. 23. Data about a patient or research subject 2302 (e.g., physical and behavioral characteristics) may be linked to data about a tissue sample or a block of tissue samples 2304 (e.g., where in the organ the tissue was taken, the size of the sample, the date the sample was taken, where it was stored) by a protocol A 2333. The protocol A 2333 may describe, for example, how tissue samples were prepared from the patient, what equipment, supplies, and procedures were used, and who performed the procedures.

[0089] Data about tissue samples 2304 may be linked to data about tissue sections 2306 (e.g., how many tissue sections exist, whether images of the sections exist, and links to the images) by a protocol B 2335. The protocol B 2335 may describe how tissue sections were prepared from a block of tissue sample, and what equipment, supplies, and procedures were used to prepare the sections, who performed the procedures, where the tissue sections were stored, whether images of the tissue sections exist, and links to the images.

[0090] Data about tissue sections 2306 may be linked to data about purified genetic material 2308 (e.g., what genes were studied, and whether the genes are listed by a nationally known database) extracted from the tissue sections by a protocol C 2337. The protocol C 2337 may describe, for example, how the genetic material was extracted from the tissue samples, who performed the extractions, and what equipment and supplies were used.

[0091] A protocol D 2339 may link purified genetic material data to data about amplified genetic material 2310 (e.g., how much material is produced and stored, and the quality of the material). The protocol D 2339 may describe, for example, the methods of amplifying the genetic material, and who performed the procedure. A protocol E 2341 may link amplified genetic material data to sequence data (e.g., the base sequences of genes, mutations observed, images of gel data, etc.) 2312. The protocol E 2341 may describe, for example, how the genetic material was sequenced, what procedures were used, and who performed them.

[0092] A protocol F 2343 may link sequence data 2312 to data about a publication of research results 2314 (including, for example, a link to the publication, authors' biographical material, and links to references). The protocol F 2343 may describe decisions made in the publication process that are not evident from the publication itself (e.g., what data were omitted from the final publication, conclusions that were made that did not withstand peer review).

[0093] This hierarchical graphical display of linked data allows researchers to track anomalies or other issues related to complex genomic data. Using this data construct, protocols become the abstract informational “vectors” that link physical objects such as tissue and equipment with abstract research results.
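
The pedigree of FIG. 23 can be modeled as a chain of directed links, each protocol connecting a source data set to a product data set. The Python sketch below is illustrative only: the `Protocol` record, the `trace_back` function, and the simplification that each data set is produced by exactly one protocol are assumptions (the text allows one or many vectors).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    """A protocol acting as a vector from a source data set to a product."""
    name: str
    source: str
    product: str


# Protocols A through F as described for FIG. 23.
PEDIGREE = [
    Protocol("A", "patient", "tissue block"),
    Protocol("B", "tissue block", "tissue sections"),
    Protocol("C", "tissue sections", "purified genetic material"),
    Protocol("D", "purified genetic material", "amplified genetic material"),
    Protocol("E", "amplified genetic material", "sequence data"),
    Protocol("F", "sequence data", "publication"),
]


def trace_back(product: str, protocols) -> list:
    """Follow protocol links from a product back toward its origin."""
    by_product = {p.product: p for p in protocols}
    chain, seen = [], set()
    while product in by_product and product not in seen:
        seen.add(product)  # guard against accidental cycles
        step = by_product[product]
        chain.append(step)
        product = step.source
    return chain


for step in trace_back("publication", PEDIGREE):
    print(f"{step.product} <- protocol {step.name} <- {step.source}")
```

Tracing from the publication back to the patient visits every protocol in the chain, so an anomaly in the sequence data can be followed back to the tissue block and procedures that produced it.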

[0094] When the status of a protocol is set to completed by a protocol performer, if reviewers are listed for the protocol, a first reviewer (Reviewer1) may be immediately notified (e.g., by email) that the protocol is completed and a URL may be provided within the email message for the reviewer to connect to the review page for the completed protocol. Not all protocols require review, and assignment of reviewers may be based on policies adopted in individual laboratories. For example, protocols for creation of standard laboratory buffers may not need to be reviewed routinely, but the review of all data-producing protocols, first by the performer of the protocol, then by the lab manager, and finally by the lab director may be mandatory.

[0095] In the review function, the designated reviewer is electronically presented with a review page containing the completed protocol information in a top frame of the page, and with a series of protocol review questions to answer in the lower frame of the page. Protocol performers decide when a protocol is complete and when to begin database-enabled review of an individual protocol. The first phase of the three-phase review process pertains to evaluation of the quality of certain basic protocol data, and is applicable to all protocols entered into the database. The reviewer may be required to answer whether the authorship and/or performer names are correct, whether the protocol category is correct, whether the protocol name is accurate, whether the protocol text is clear, whether the equipment/supplies included are correct, whether the NTR (non tissue reagent) selection is correct, whether the TR (Tissue Reagent) selection is complete, whether embedded protocols are correct, whether or not the protocol represents the state of the art for the home laboratory, and to review the overall quality rating of the data in the protocol.

[0096] The second and third phases of the protocol review process pertain only to protocols producing molecular data, for example, genomic sequence data, mRNA or cDNA expression data, comparative genomic hybridization data, and immunostaining data. The second review phase focuses on evaluation and annotation of data of the specific molecular datatype produced in the protocol under review. For example, if the protocol is a sequence data-producing protocol, for each set of sequence data attached to a specific TR, the user is required to declare whether a mutation is present, and if so, what type (based on a validated list of mutation types), and to designate the specific genomic, cDNA, and protein position and real or predicted alteration in sequence caused by this change, in addition to a confidence level in this data, and notes on the specific sequence data. In the third phase of the protocol review process, the user is presented with all molecular data for which the second review phase has been completed from the database for the specific TR's, gene, and gene functions studied in the current protocol. The reviewer then makes a “functional genomic” determination for each of the genes and gene functions under study based on a cognitive summation of these previously reviewed data, including those data in the current protocol. In making a functional genomic determination the reviewer categorizes the function of the gene under study as “normally on,” “normally off,” “abnormally on,” “abnormally off,” “|off|,” or “|on|,” or states that the data are inadequate to determine the gene's function. “Normally on” means that a gene (overall) or gene function is demonstrably and preponderantly on in user-specified reference normal tissue during a normal day in an adult organism, and is demonstrably on in the experimental tissue under study. “Normally off” means that this gene function is demonstrably and preponderantly off in user-specified reference normal tissue during a normal day in an adult organism, and is likewise demonstrably off in experimental tissue under study. “Abnormally on” means that this gene function is demonstrably and preponderantly off in user-specified reference normal tissue during a normal day in an adult organism, but is demonstrably and preponderantly on in experimental tissue under study. “Abnormally off” means that this gene is demonstrably and preponderantly on in user-specified reference normal tissue during a normal day in an adult organism, but is demonstrably and preponderantly off in experimental tissue under study. |On| means that the gene function is demonstrably and preponderantly on in the experimental tissue under study but that no comparison user-specified normal tissue is available, or the normal tissue data is uninterpretable. |Off| means that the gene function is demonstrably off in the experimental tissue under study but that no comparison user-specified normal tissue is available or the normal tissue data is uninterpretable. “No data” means no interpretable data are available for a specific |gene status| or gene function. Separate determinations of gene function are made for each tissue block sample. If there is a discrepancy in results between samples in a given patient, the reviewer is asked to make the best summary judgment, but the fact that there is a discrepancy is separately recorded for this summary data point. The reviewer rates his or her confidence in the determination as low, medium, or high, and the rating is recorded with the reviewer's categorization. All base data are considered when making such a determination, but a reviewer may accord more or less weight to particular data, based on the quality of the base data. The reviewer is then asked again to make the same judgment of the gene's status, in a “Fully Integrated” genomic context, where the reviewer is presented with a menu of all other pertinent genomic data in the database for that particular sample and/or group of samples.
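
The category definitions above amount to a lookup on two observations: the state of the gene function in reference normal tissue and its state in the experimental tissue. The Python sketch below encodes only that mapping; it does not replace the reviewer's judgment. The function name `categorize` and the use of `None` to represent missing or uninterpretable data are assumptions.

```python
from typing import Optional


def categorize(normal_on: Optional[bool], experimental_on: Optional[bool]) -> str:
    """Map reference and experimental states to a functional genomic category.

    True means demonstrably on, False means demonstrably off, and None means
    no interpretable data are available for that tissue.
    """
    if experimental_on is None:
        return "no data"
    if normal_on is None:
        # No usable normal comparison: report the experimental state alone.
        return "|on|" if experimental_on else "|off|"
    if normal_on:
        return "normally on" if experimental_on else "abnormally off"
    return "abnormally on" if experimental_on else "normally off"


print(categorize(True, False))
print(categorize(None, True))
```

In use, the reviewer's low, medium, or high confidence rating would be stored alongside the returned category, as the text describes.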

[0097] This functional genomic summarization process is carried out first at the TR level, and then ascends to a Sentinel Subgroup level (if the user has specified Sentinel Subgroups for the Tissue Reagents), and finally at the entire Study Subject level. At each level of summarization, the user is asked to summarize the abnormality where any abnormality has been detected. Upon completion of the review by the first reviewer, the user declares the review complete, and if a second reviewer is listed on the protocol, the second reviewer is automatically notified.

[0098] For example, if comparative genomic hybridization (CGH) data were contained in the database in addition to the sequence data attached to a particular protocol when the protocol passes the first stage of the review process, then after the reviewer completes the per sample and per patient judgment based on the protocol data alone, the reviewer is presented with a list of other relevant genomic data types available for each sample and each patient. “Relevant” data types for a specific Tissue Reagent sample are any data that are tied to the same gene and/or gene function under study, or, for positional data, such as CGH data, any data available from the same chromosome, or region of the chromosome if the exact relative position of a gene is known.

[0099] The methodology for linking gene function data and positional data is discussed below. The reviewer may click on each of the links to each of the data types listed, consider all the data together, and again come to a “best judgment” of whether the specific gene (as a whole) and/or gene function is, for example, normally on or normally off. If, for example, in a protocol under review, a mutation is detected and predicted to truncate a particular protein and render it totally nonfunctional, the reviewer would judge the gene and all known gene functions to be “abnormally off” because the gene for making the protein is sometimes on during a normal day in a cell derived from the tissue of origin, but the data would not be certain because it is possible that another non-mutated copy of the gene is present in the cell, and was not detected in the sequencing reaction. If, however, the reviewer considers the sequence data along with CGH data from chromosome 17 indicating loss of copy number in the region of the gene in the same tissue reagent sample, the confidence level that this gene is in fact abnormally off in the tissue under study goes up dramatically, and the reviewer would likely record the summary data for the gene (overall) and specific functional data as “high confidence for being abnormally off,” where the confidence level would probably only have been “medium” if only the sequence data or only the CGH data were considered in isolation.

[0100] Tissue removed from patients as part of biopsy, surgery, and autopsy procedures provides valuable information about diseased biological systems. Such tissue-based studies are useful complements to in vitro and laboratory animal research on a given disease, because of well-known differences in biological behavior between species and wildly different behavior of cells in vitro compared to in vivo. Use of a functional genomic symbology does not obviate interpretation using current methods. The mechanism for recording this symbology is designed to be changeable when new data are added, or if old data need to be reinterpreted. The protocol application permits the extraction of functional data from static tissue and provides the opportunity for comparing genomic function over time, in that the user may choose to study tissues obtained at different times in a patient's life, or tissues obtained before, during, and/or after specific interventions in a patient or laboratory animal. Finally, by providing a common format for describing results, the protocol application allows the comparison of results across studies, such as, for example, from one cancer to another.

[0101] Once a single reviewer has marked a review as completed and submitted it to the database, the next reviewer (if any) is notified (e.g., by email) and similarly reviews and comments on the findings and on the previous review if indicated. Each reviewer has the ability to change the previous functional interpretation of the data if he or she disagrees with the interpretation. Thus, the most senior or most experienced personnel in a lab most likely is the last intralab reviewer, similar to implicit current practice. When the last reviewer has declared a review complete, the protocol itself and the attached base data (images, interpretations of images) become fixed and cannot subsequently be changed. However, annotations to protocols are allowed in a dated format, and may be appended at the end of each protocol with the name and date of each entry. Also, as stated above, the functional genomic interpretation of data within a protocol can be changed subsequently, based on reinterpretation of data.
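
The review sequence in the preceding paragraph behaves like a small state machine: reviewers act in order, base data lock after the last review, and dated annotations and reinterpretation remain possible afterward. The Python class below is a sketch of that behavior only; the class, method, and field names are hypothetical, and notification is reduced to returning the name of the next reviewer.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class ProtocolReview:
    """Illustrative model of the sequential review of one protocol."""
    reviewers: List[str]
    base_data: Dict[str, str] = field(default_factory=dict)
    interpretation: str = "no data"
    completed: int = 0
    annotations: List[Tuple[str, date, str]] = field(default_factory=list)

    @property
    def locked(self) -> bool:
        # Base data become fixed once every listed reviewer has finished.
        return self.completed >= len(self.reviewers)

    def set_base_data(self, key: str, value: str) -> None:
        if self.locked:
            raise PermissionError("base data are fixed after the last review")
        self.base_data[key] = value

    def complete_review(self, reviewer: str,
                        interpretation: Optional[str] = None) -> Optional[str]:
        """Record a completed review; return the next reviewer to notify."""
        if self.locked or reviewer != self.reviewers[self.completed]:
            raise PermissionError(f"{reviewer} is not the current reviewer")
        if interpretation is not None:
            # A reviewer may change the previous functional interpretation.
            self.interpretation = interpretation
        self.completed += 1
        return None if self.locked else self.reviewers[self.completed]

    def annotate(self, name: str, text: str, when: Optional[date] = None) -> None:
        # Dated annotations are appended even after locking.
        self.annotations.append((name, when or date.today(), text))

    def reinterpret(self, interpretation: str) -> None:
        # Functional interpretation stays changeable after locking.
        self.interpretation = interpretation
```

A caller would send the email notification to whatever name `complete_review` returns, and treat a return value of `None` as the signal that the protocol is now fixed.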

[0102] Specific functions for most human genes at present are unknown, and for the remainder are only partially known. The DA is structured to enable parallel growth in understanding and knowledge of gene function in normal and abnormal circumstances by prompting the scientist-submitter to identify all results according to specific named genes (using accepted names), and, when possible, to provide referenced gene function definitions. Where no gene functional attributes are known or hypothesized, the data are linked only to the gene name, and imputed effects on function (e.g., “normally on,” “normally off”) are categorized by |gene status| alone. This methodology allows recategorization of |gene status| data at a later time when specific gene functions are known.

[0103] Where a gene is known but does not yet have an official name, the user may be required to submit the gene to the appropriate authorities (e.g., the Human Genome Organisation (HUGO) or the NCBI) prior to entering data related to it in the database. Frequent downloads of the accepted gene names allow a new gene to appear in the database within a short time if naming authorities respond quickly.

[0104] Referential linkage of gene-specific data to positional data may occur through combination of the latest and most accurate Gene Mapping data (e.g., GeneBridge 4 from Genome Maps 1999) and overall map length for a given chromosome. For example, gene tp53 is located at 54.01 cR (centiRays) on the GeneBridge 4 map. The total length of chromosome 17 on this map is 544.07 cR. The relative position of tp53 on the map is thus 54.01/544.07, or approximately 0.1 FLpter (fractional length from p-arm telomere). Comparative genomic data are also reported in position relative to the total length of the chromosome, and these data can also be converted to FLpter values. Thus, if a gene has been mapped, a common position format is available for consideration of both gene-specific and positional data, such as loss of heterozygosity. This methodology is only as accurate as the maps used to generate the data. However, even with some imprecision in map locations, the ability to relate gene position to other purely positional data in a programmatic way may be very useful.
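
The conversion described in the preceding paragraph is a single division. The following Python sketch reproduces the tp53 arithmetic using the two map values given in the text; the function name `flpter` and its input checks are assumptions made for illustration.

```python
def flpter(position_cr: float, chromosome_length_cr: float) -> float:
    """Fractional length from the p-arm telomere for a mapped position.

    Both arguments are in centiRays on the same radiation hybrid map.
    """
    if chromosome_length_cr <= 0:
        raise ValueError("chromosome length must be positive")
    if not 0 <= position_cr <= chromosome_length_cr:
        raise ValueError("position must lie within the chromosome")
    return position_cr / chromosome_length_cr


# Values from the text: tp53 at 54.01 cR on a 544.07 cR map of chromosome 17.
tp53_position = flpter(54.01, 544.07)
print(round(tp53_position, 3))  # 0.099, i.e. approximately 0.1 FLpter
```

Because positional data such as CGH results can be expressed on the same 0-to-1 scale, a gene position and a positional interval can then be compared directly.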

[0105] The structure of the database puts equal emphasis on “normal” and “abnormal” data, and integrates these two categories of data, rather than reporting only isolated “abnormal” data. The integration of normal and abnormal data could be instrumental to the success of research efforts, particularly in cancer, because it is the “normal” pathways in cancer cells for which there is no redundancy that may be most sensitive to pharmaceutical or other inactivation. In other words, normal cells have multiple redundant pathways for many cellular functions. Cancer cells are likely to lack such redundancy for some or many pathways. Knowledge of which pathways remain normal and have no backup in a cancer cell may provide the best targets for improved therapy.

[0106] Scientific Query Builder

[0107] The Scientific Query Builder supports extraction of key data variables from the database through inherent human pattern recognition abilities, and through computer-encoded pattern recognition algorithms. Essentially, the Query Builder allows specific detailed data and/or summary data to be queried from the entire database, and presents the data in a graphical format linked to all underlying data. For example, a user may query the database for all data from all studies in the database relating to prostate cancer patients who ever smoked cigarettes and who have abnormalities in either the p53 or PTEN gene, and the user may further specify that the patients' initial biopsy (bx) Gleason scores and weight loss or gain status should be displayed. A subscriber to the database may run the query over all the data in the database to return the results from all data stored within the database, along with links to protocols used to produce the data, and links to underlying data from which the returned data are derived.

[0108] Continuing with the example of the search for data related to smokers with prostate cancer and p53 or PTEN gene abnormalities, the functional genomic data returned from the query may be color coded for high information content viewing of the data. For example, the condition of a patient's gene functionality may be represented by a rectangular datacell symbol, which may be black to indicate abnormal gene function and green to represent normal gene function. If the gene function is off, the perimeter of the rectangle may be red, and if it is on, the perimeter of the rectangle may be black. Representation of Gleason score, weight loss, tobacco use, and other data may be similarly encoded to provide maximum information content to the viewer in a small spatial area. While examining the data, when the user becomes interested in a certain pattern within the data and wishes to examine the data underlying it, simply clicking on the datacell representing the data in question leads to the summary data supporting it. Summary data may indicate how the data represented by the datacell were acquired (e.g., what protocols were used to produce them) and what the supporting data for the data represented by the datacell are. The user can go even farther if desired, and view the protocol used for the data shown, and any protocols intermediate between collection of the tissue and generation of the data, perhaps first in pedigree form, and then drilling further to view a specific protocol. Thus, the user may review and analyze the results of specific queries.
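
The datacell color scheme described above uses two independent visual channels: fill for normal versus abnormal, and perimeter for on versus off. The Python fragment below is a minimal sketch of that mapping; the `DataCell` type and the function name are hypothetical, and cells for the |On|, |Off|, and “no data” categories are not covered because the text does not specify their appearance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCell:
    """Illustrative rendering attributes for one rectangular datacell."""
    fill: str
    border: str


def encode_datacell(abnormal: bool, is_on: bool) -> DataCell:
    """Apply the example scheme from the text.

    Fill:   black for abnormal gene function, green for normal.
    Border: black when the gene function is on, red when it is off.
    """
    fill = "black" if abnormal else "green"
    border = "black" if is_on else "red"
    return DataCell(fill=fill, border=border)
```

Under this scheme an “abnormally off” gene function would appear as a black cell with a red perimeter, and a “normally on” function as a green cell with a black perimeter.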

[0109] Statistical analysis of the extracted data is enabled by functionality provided by the DA. For example, the application may provide summaries of numbers of samples within each category, with percentages or other statistical measures. Such data may be presented in a format suitable for export to spreadsheets for further analysis as required. Output of data such as repeated blood measurements over time plotted on a graph or time intervals for specific treatments may also be enabled and may be displayed side by side with molecular data. Molecular data types that are not in the “functional” format, such as specific loss of heterozygosity or mutation data, may also be displayed.
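
A per-category summary with percentages, in a form a spreadsheet can open, might look like the following Python sketch. It uses only the standard library; the function names, the column headers, and the choice of CSV as the export format are assumptions, since the text does not name a format.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple


def summarize(statuses: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Count samples per category and express each count as a percentage."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return [(category, n, round(100.0 * n / total, 1))
            for category, n in counts.most_common()]


def to_csv(rows: List[Tuple[str, int, float]]) -> str:
    """Render summary rows as CSV text suitable for spreadsheet import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["category", "count", "percent"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_csv(summarize(...))` on the status categories returned by a query gives one row per category, with the most frequent category first.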

[0110] The Project Manager, Patients, Contact Manager, Equipment and Supplies, Biomaterials, Protocols, and Scientific Query Builder applications are designed for daily use in a laboratory setting. The laboratory may determine what data are input into the database, and how they are reviewed and maintained. The laboratory may allow people outside the lab to view data in one of two ways. Collaborative agreements between the laboratory and an outsider may give authority to an outsider to use the DA to view and/or add to and/or search selected or all data within the laboratory's local database. Specific patients may be allowed to view their own research data in appropriate settings (i.e., with researchers or others able to discuss the data), which may provide an incentive for patients to participate in research. The laboratory may also publish datamorphs from the database. When publishing a datamorph, the specific results of a query with all underlying table data may be saved as a single unit (datamorph). An introduction and discussion of the data and appropriate references may be appended, and the entire “completed datamorph” may be submitted for consideration to a publication database. Once published, all data relationships within the datamorph are maintained and the ability to search within and among datamorphs is enabled.

[0111] The DA also enables sharing of materials in addition to sharing of information. Laboratories often share reagents, and often spend a large amount of time administering such sharing, without reimbursement and without any clearly organized structure or logging process. Thus, the Biomaterials application may allow a lab to publish the availability of specific reagents that other laboratories may request and gain approval to obtain. Payment and shipping of the reagents may occur in a similar fashion using existing web commerce methods.

[0112] Yet another emergent property of the system is that in many ways it may lessen inequities in giving scientific credit for work performed. In the current system, first and last authors of papers are often given disproportionate credit for work done. With the system described here, during each step of the project manager, equipment and supply, biomaterials, protocol, and review process, each person's contribution to the research project is recorded accurately and accounted for.

[0113] Use of the DA by laboratories conducting scientific research on human patients may provide an opportunity to create a constructive way for patients to become more aware and informed about the research. Having patients interested, involved, and knowledgeable about the research process, as it affects them, may help advance scientific research and help our culture adapt better to new knowledge and techniques for diagnosing and treating disease.

[0114] The techniques, methods, and systems described here may find applicability in any computing or processing environment. Various implementations of the systems and techniques described here may be realized in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. A system or other apparatus that uses one or more of the techniques and methods described here may be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate on input and/or generate output in a specific and predefined manner. Such a computer system may include one or more programmable processors that receive data and instructions from, and transmit data and instructions to, a data storage system, and suitable input and output devices. Each computer program may be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language may be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors.

[0115] Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Storage devices suitable for tangibly embodying computer instructions and data include all forms of non-volatile memory, including semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks.

[0116] These elements also can be found in a conventional desktop or workstation computer as well as other computers suitable for executing computer programs implementing the methods described here, which can be used in conjunction with any content viewing or manipulation software, or any other software capable of displaying portions of a larger body of content. Any of the foregoing may be supplemented by, or implemented in, specially designed ASICs (application specific integrated circuits).

[0117] Referring to FIG. 24, a computer system 2400 represents a hardware setup for executing software that allows a user to perform tasks such as storing, viewing, editing, retrieving, and downloading research results and data upon which the research results are based, that is, any combination of text, images, numbers, hyperlinks, and links to other objects. The computer system 2400 of FIG. 24 may also be programmed with computer-readable instructions to enable data to be perceived as stored, viewed, edited, retrieved, downloaded, and otherwise manipulated.

[0118] The computer system includes various input/output (I/O) devices (mouse 2403, keyboard 2405, display 2407) and a general purpose computer 2400 having a central processor unit (CPU) 2421, an I/O unit 2417 and a memory 2409 that stores data and various programs such as an operating system 2411, and one or more application programs 2413. The computer system 2400 preferably also includes some sort of communications card or device 2423 (for example, a modem or network adapter) for exchanging data with a network 2427 via a communications link 2425 (for example, a telephone line).

[0119] A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. Accordingly, other implementations are within the scope of the following claims.

What is claimed is:
 1. A method of integrating and distributing research data, the method comprising: receiving research data selectively extracted from a local database, the research data including data concerning research results and data upon which the research results are based, wherein the data concerning research results are linked to the data upon which the research results are based; storing the extracted research data in a common database; and distributing to a user of the common database research data from the extracted research data.
 2. The method of claim 1, wherein links between the data concerning research results and the data upon which the research results are based track logical relationships between the data concerning research results and the data upon which the research results are based.
 3. The method of claim 1, wherein the data upon which the research results are based include phenotype data and genotype data.
 4. The method of claim 1, wherein the data upon which the research results are based include information concerning equipment and supplies used in generating the research results.
 5. The method of claim 1, wherein the data upon which the research results are based include information concerning biomaterials used in generating the research results.
 6. The method of claim 1, wherein the data upon which the research results are based include information concerning protocols used in generating the research results.
 7. The method of claim 1 further comprising: providing a reviewer with access to the research data stored in the common database; receiving an input from the reviewer; and labeling selected research data stored in the common database as approved by the reviewer for public distribution from the common database based on the input from the reviewer.
 8. The method of claim 1, wherein the distributed research data are selected for distribution by the user.
 9. The method of claim 1, wherein distributing the research data comprises distributing the research data in a defined database table structure.
 10. A system for integrating and distributing research data, the system comprising: a processor; and memory adapted for storing instructions performed by the processor for: receiving research data selectively extracted from a local database, the research data including data concerning research results, data upon which the research results are based, and information concerning protocols used in generating the research results, wherein the data concerning research results are linked to the data upon which the research results are based and to the information concerning the protocols; storing the extracted research data in a common database; and distributing to a user of the common database research data from the extracted research data.
 11. The system of claim 10, wherein the links between the data concerning research results and the data upon which the research results are based track logical relationships between the data concerning research results and the data upon which the research results are based.
 12. The system of claim 10, wherein the data upon which the research results are based include phenotype data and genotype data.
 13. The system of claim 10, wherein the data upon which the research results are based include information concerning equipment and supplies used in generating the research results.
 14. The system of claim 10, wherein the data upon which the research results are based include information concerning biomaterial used in generating the research results.
 15. The system of claim 10, wherein the data upon which the research results are based include information concerning protocols used in generating the research results.
 16. The system of claim 10, wherein the memory is further adapted for storing instructions performed by the processor for: providing a reviewer with access to the research data stored in the common database; receiving an input from the reviewer; and labeling selected research data stored in the common database as approved by the reviewer for public distribution from the common database based on the input from the reviewer.
 17. The system of claim 10, wherein the memory is further adapted for storing instructions performed by the processor for distributing research data selected for distribution by the user.
 18. The system of claim 10, wherein the memory is further adapted for storing instructions performed by the processor for distributing the research data in a defined database table structure.
 19. A computer program, residing on a computer-readable medium, for integrating and distributing research data, comprising instructions for causing a computer to: receive research data selectively extracted from a local database, the research data including data concerning research results and data upon which the research results are based, wherein the data concerning research results are linked to the data upon which the research results are based; and store the extracted research data in a common database.
 20. The computer program of claim 19, wherein the links between the data concerning research results and the data upon which the research results are based track logical relationships between the data concerning research results and the data upon which the research results are based.
 21. The computer program of claim 17, wherein the data upon which the research results are based include phenotype data and genotype data.
 22. The computer program of claim 17, wherein the data upon which the research results are based include information concerning equipment and supplies used in generating the research results.
 23. The computer program of claim 17, wherein the data upon which the research results are based include information concerning biomaterial used in generating the research results.
 24. The computer program of claim 17, wherein the data upon which the research results are based include information concerning protocols used in generating the research results.
 25. The computer program of claim 17, further comprising instructions for causing a computer to: provide a reviewer with access to the research data stored in the common database; receive an input from the reviewer; and label selected research data stored in the common database as approved by the reviewer for public distribution from the common database based on the input from the reviewer.
 26. The computer program of claim 17, further comprising instructions for causing the computer to distribute from the common database research data, which are selected for distribution by the user.
 27. The computer program of claim 17, further comprising instructions for causing the computer to distribute the research data in a defined database table structure.